An Apple study shows that large language models (LLMs) can improve their instruction-following performance through a checklist-based reinforcement learning scheme, an automated analogue of the simple productivity trick of checking one's own work.
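For a flavor of the idea, here is a minimal sketch of how a checklist-style reward could be computed, with a toy keyword judge standing in for an LLM judge; the function names are hypothetical illustrations, not the study's code:

```python
# Minimal sketch of a checklist-based reward (hypothetical API, not Apple's code).
# Each candidate response is scored against yes/no checklist items by a judge;
# the mean pass rate becomes the scalar RL reward.
from typing import Callable, List

def checklist_reward(
    response: str,
    checklist: List[str],
    judge: Callable[[str, str], float],  # returns a score in [0, 1] per item
) -> float:
    """Average per-item judge scores into one reward signal."""
    if not checklist:
        return 0.0
    scores = [judge(response, item) for item in checklist]
    return sum(scores) / len(scores)

# Toy keyword-matching judge standing in for an LLM judge:
toy_judge = lambda resp, item: 1.0 if item.lower() in resp.lower() else 0.0
reward = checklist_reward(
    "The function sorts the list in place and returns None.",
    ["sorts the list", "returns None"],
    toy_judge,
)
print(reward)  # 1.0
```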
A new study by MIT CSAIL researchers maps the challenges of AI in software development, identifying bottlenecks and highlighting research directions to move the field forward; the goal is to automate routine tasks so humans can focus on high-level design.
A detailed comparison of the architectures of recent large language models (LLMs) including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi K2, focusing on key design choices and their impact on performance and efficiency.
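One design choice that recurs across several of these models is grouped-query attention (GQA), where multiple query heads share a single key/value head to shrink the KV cache. A minimal illustrative sketch (not any specific model's code, and omitting causal masking):

```python
import torch

# Illustrative grouped-query attention: n_q_heads query heads share
# n_kv_heads key/value heads (n_q_heads must be a multiple of n_kv_heads).
def grouped_query_attention(q, k, v, n_q_heads=8, n_kv_heads=2):
    # q: (batch, seq, n_q_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=2)  # expand KV heads to match query heads
    v = v.repeat_interleave(group, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (batch, heads, seq, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return (scores.softmax(dim=-1) @ v).transpose(1, 2)  # (batch, seq, heads, head_dim)

b, s, d = 2, 16, 64
out = grouped_query_attention(
    torch.randn(b, s, 8, d), torch.randn(b, s, 2, d), torch.randn(b, s, 2, d)
)
print(out.shape)  # torch.Size([2, 16, 8, 64])
```

The memory win: the KV cache stores only the 2 key/value heads rather than 8, a 4x reduction at this configuration.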
Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools such as vLLM and NVIDIA NIM microservices.
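For a taste of the serving layer, here is a minimal vLLM offline-batching sketch; the model ID is a placeholder, and in production you would typically run the OpenAI-compatible server (`vllm serve <model>`) behind a load balancer instead:

```python
# Minimal vLLM sketch: offline batched generation. vLLM's continuous batching
# and PagedAttention are what let one GPU serve many concurrent requests.
# The model ID below is a placeholder; substitute whatever you deploy.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize the benefits of continuous batching in one sentence.",
    "What is PagedAttention?",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```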
PaperCoder is a multi-agent LLM system that transforms scientific papers into code repositories through a three-stage pipeline: planning, analysis, and code generation. It aims to create faithful, high-quality implementations.
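A rough sketch of what such a staged pipeline can look like follows; the function names and prompts are hypothetical stand-ins, not PaperCoder's actual interface:

```python
# Hypothetical sketch of a planning -> analysis -> coding pipeline in the
# spirit of PaperCoder; `ask_llm` is a stand-in for any chat-completion call.
from typing import Callable, Dict

def paper_to_repo(paper_text: str, ask_llm: Callable[[str], str]) -> Dict[str, str]:
    # Stage 1: planning — roadmap, file layout, and configs for the repo.
    plan = ask_llm(f"Draft a repo plan (files, classes, configs) for:\n{paper_text}")
    # Stage 2: analysis — file-level implementation details from the paper.
    analysis = ask_llm(f"For this plan:\n{plan}\nList implementation details per file.")
    # Stage 3: coding — generate each file conditioned on plan and analysis.
    files: Dict[str, str] = {}
    for spec in analysis.split("\n\n"):  # naive one-spec-per-paragraph split
        name = spec.splitlines()[0].strip()
        files[name] = ask_llm(f"Write file '{name}'.\nSpec:\n{spec}\nPlan:\n{plan}")
    return files

# Toy run with an echo "LLM" just to show the data flow:
repo = paper_to_repo("Attention Is All You Need", lambda p: p[:40])
print(list(repo))
```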
DeepMind researchers propose a new 'streams' approach to AI development, focusing on experiential learning and autonomous interaction with the world, a path they argue could move beyond the limitations of current large language models and potentially surpass human intelligence.
A Newsweek interview with Yann LeCun, Meta's chief AI scientist, detailing his skepticism of current LLMs and his focus on the Joint Embedding Predictive Architecture (JEPA) as the future of AI, with an emphasis on world modeling and planning capabilities.
This article examines the dual nature of Generative AI in cybersecurity, detailing how it can be exploited by cybercriminals and simultaneously used to enhance defenses. It covers the history of AI, the emergence of GenAI, potential threats, and mitigation strategies.
ByteDance Research has released DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), an open-source reinforcement learning system for LLMs that aims to improve reasoning abilities and address reproducibility issues. DAPO includes innovations like Clip-Higher, Dynamic Sampling, Token-level Policy Gradient Loss, and Overlong Reward Shaping, scoring 50 points on the AIME 2024 benchmark with the Qwen2.5-32B base model.
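For a flavor of two of those pieces, here is a minimal reading of the Clip-Higher objective with token-level aggregation; this follows the paper's equations and reported ε values, not ByteDance's released code:

```python
import torch

# Sketch of DAPO's Clip-Higher + token-level policy-gradient loss.
# Decoupled clip range: eps_high > eps_low lets low-probability tokens'
# ratios grow further, countering entropy collapse (values from the paper).
def dapo_policy_loss(logp_new, logp_old, advantages, mask,
                     eps_low=0.2, eps_high=0.28):
    # logp_*: per-token log-probs; advantages: per-token advantage;
    # mask: 1 for response tokens, 0 for padding. Shapes: (batch, seq).
    ratio = (logp_new - logp_old).exp()
    clipped = ratio.clamp(1 - eps_low, 1 + eps_high)
    per_token = -torch.minimum(ratio * advantages, clipped * advantages)
    # Token-level aggregation: average over *all* valid tokens in the batch,
    # so long responses are not down-weighted relative to short ones.
    return (per_token * mask).sum() / mask.sum()

b, s = 4, 16
loss = dapo_policy_loss(torch.randn(b, s), torch.randn(b, s),
                        torch.randn(b, s), torch.ones(b, s))
print(loss.item())
```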
Scientists are exploring the capabilities of the DeepSeek-R1 AI model, released by the Chinese firm DeepSeek. The open-weight, cost-effective model performs comparably to industry leaders on mathematical and scientific problems, and researchers are leveraging its accessibility to build custom models for specific disciplines, although it still struggles with some tasks.
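For instance, the openly released distilled checkpoints load with standard Hugging Face tooling, a common starting point for discipline-specific fine-tuning (model ID as published on Hugging Face; even the 7B variant needs a capable GPU):

```python
# Loading one of the openly released R1 distilled checkpoints with
# Hugging Face Transformers; researchers fine-tune such models on
# discipline-specific data starting from a setup like this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```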